Atcomm Enterprises is a global technology services, management consulting and outsourcing company. We are independent and 100% Australian owned. Through continuous improvement and innovation we stay on top of world trends and offer to share our experience and proven methodologies with our clients. More: atcomm.com.au
Programs and Processes
Program
A program is an executable file residing on disk in a directory. A program is read into memory and is executed by the kernel as a result of one of the six exec functions.
Processes and Process ID
An executing instance of a program is called a process, a term used on almost every page of this text. Some operating systems use the term task to refer to a program that is being executed.
The UNIX System guarantees that every process has a unique numeric identifier called the process ID. The process ID is always a non-negative integer.
Example
The program in Figure 1.6 prints its process ID.
If we compile this program into the file a.out and execute it, we have
$ ./a.out
hello world from process ID 851
$ ./a.out
hello world from process ID 854
When this program runs, it calls the function getpid to obtain its process ID.
Figure 1.6. Print the process ID
#include "apue.h" int main(void) { printf("hello world from process ID %d\n", getpid()); exit(0); }
Process Control
There are three primary functions for process control: fork, exec, and waitpid. (The exec function has six variants, but we often refer to them collectively as simply the exec function.)
Example
The process control features of the UNIX System are demonstrated using a simple program (Figure 1.7) that reads commands from standard input and executes the commands. This is a bare-bones implementation of a shell-like program. There are several features to consider in this 30-line program.
We use the standard I/O function fgets to read one line at a time from the standard input. When we type the end-of-file character (which is often Control-D) as the first character of a line, fgets returns a null pointer, the loop stops, and the process terminates.
Because each line returned by fgets is terminated with a newline character, followed by a null byte, we use the standard C function strlen to calculate the length of the string, and then replace the newline with a null byte. We do this because the execlp function wants a null-terminated argument, not a newline-terminated argument.
We call fork to create a new process, which is a copy of the caller. We say that the caller is the parent and that the newly created process is the child. Then fork returns the non-negative process ID of the new child process to the parent, and returns 0 to the child. Because fork creates a new process, we say that it is called once (by the parent) but returns twice (in the parent and in the child).
In the child, we call execlp to execute the command that was read from the standard input. This replaces the child process with the new program file. The combination of a fork, followed by an exec, is what some operating systems call spawning a new process. In the UNIX System, the two parts are separated into individual functions.
Because the child calls execlp to execute the new program file, the parent wants to wait for the child to terminate. This is done by calling waitpid, specifying which process to wait for: the pid argument, which is the process ID of the child. The waitpid function also returns the termination status of the child (the status variable), but in this simple program, we don't do anything with this value. We could examine it to determine exactly how the child terminated.
The most fundamental limitation of this program is that we can't pass arguments to the command that we execute. We can't, for example, specify the name of a directory to list. We can execute ls only on the working directory. To allow arguments would require that we parse the input line, separating the arguments by some convention, probably spaces or tabs, and then pass each argument as a separate argument to the execlp function. Nevertheless, this program is still a useful demonstration of the process control functions of the UNIX System.
If we run this program, we get the following results. Note that our program has a different prompt (the percent sign) to distinguish it from the shell's prompt.
$ ./a.out
% date
Sun Aug  1 03:04:47 EDT 2004          programmers work late
% who
sar      :0           Jul 26 22:54
sar      pts/0        Jul 26 22:54    (:0)
sar      pts/1        Jul 26 22:54    (:0)
sar      pts/2        Jul 26 22:54    (:0)
% pwd
/home/sar/bk/apue/2e
% ls
Makefile    a.out    shell1.c
% ^D                                  type the end-of-file character
$                                     the regular shell prompt
Figure 1.7. Read commands from standard input and execute them
#include "apue.h" #include <sys/wait.h> int main(void) { char buf[MAXLINE]; /* from apue.h */ pid_t pid; int status; printf("%% "); /* print prompt (printf requires %% to print %) */ while (fgets(buf, MAXLINE, stdin) != NULL) { if (buf[strlen(buf) - 1] == "\n") buf[strlen(buf) - 1] = 0; /* replace newline with null */ if ((pid = fork()) < 0) { err_sys("fork error"); } else if (pid == 0) { /* child */ execlp(buf, buf, (char *)0); err_ret("couldn't execute: %s", buf); exit(127); } /* parent */ if ((pid = waitpid(pid, &status, 0)) < 0) err_sys("waitpid error"); printf("%% "); } exit(0); }
The notation ^D is used to indicate a control character. Control characters are special characters formed by holding down the control key (often labeled Control or Ctrl) on your keyboard and then pressing another key at the same time. Control-D, or ^D, is the default end-of-file character.
Threads and Thread IDs
Usually, a process has only one thread of controlone set of machine instructions executing at a time. Some problems are easier to solve when more than one thread of control can operate on different parts of the problem. Additionally, multiple threads of control can exploit the parallelism possible on multiprocessor systems.
All the threads within a process share the same address space, file descriptors, stacks, and process-related attributes. Because they can access the same memory, the threads need to synchronize access to shared data among themselves to avoid inconsistencies.
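To make the need for synchronization concrete, here is a minimal sketch (not an example from this text) using POSIX threads: two threads increment a shared counter, and a mutex serializes their access so that no updates are lost. Without the mutex, the final count would usually come out short.

#include <pthread.h>
#include <stdio.h>

static long counter;                    /* data shared by both threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *
incr(void *arg)
{
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);      /* serialize access to the shared counter */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int
main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, incr, NULL);
    pthread_create(&t2, NULL, incr, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter); /* 2000000 when access is synchronized */
    return 0;
}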
As with processes, threads are identified by IDs. Thread IDs, however, are local to a process. A thread ID from one process has no meaning in another process. We use thread IDs to refer to specific threads as we manipulate the threads within a process.
Functions to control threads parallel those used to control processes. Because threads were added to the UNIX System long after the process model was established, however, the thread model and the process model have some complicated interactions.
ERROR HANDLING
When an error occurs in one of the UNIX System functions, a negative value is often returned, and the integer errno is usually set to a value that gives additional information. For example, the open function returns either a non-negative file descriptor if all is OK or -1 if an error occurs. An error from open has about 15 possible errno values, such as file doesn't exist, permission problem, and so on. Some functions use a convention other than returning a negative value. For example, most functions that return a pointer to an object return a null pointer to indicate an error.
The file <errno.h> defines the symbol errno and constants for each value that errno can assume. Each of these constants begins with the character E. Also, the first page of Section 2 of the UNIX system manuals, named intro(2), usually lists all these error constants. For example, if errno is equal to the constant EACCES, this indicates a permission problem, such as insufficient permission to open the requested file.
On Linux, the error constants are listed in the errno(3) manual page.
POSIX and ISO C define errno as a symbol expanding into a modifiable lvalue of type integer. This can be either an integer that contains the error number or a function that returns a pointer to the error number. The historical definition is
extern int errno;
But in an environment that supports threads, the process address space is shared among multiple threads, and each thread needs its own local copy of errno to prevent one thread from interfering with another. Linux, for example, supports multithreaded access to errno by defining it as
extern int *__errno_location(void);
#define errno (*__errno_location())
There are two rules to be aware of with respect to errno. First, its value is never cleared by a routine if an error does not occur. Therefore, we should examine its value only when the return value from a function indicates that an error occurred. Second, the value of errno is never set to 0 by any of the functions, and none of the constants defined in <errno.h> has a value of 0.
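As a quick illustration of the first rule (a sketch, not code from this text): we look at errno only after a call such as open has signalled failure by returning -1.

#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

int
main(void)
{
    int fd = open("/no/such/file", O_RDONLY);

    if (fd < 0)                         /* only now is errno meaningful */
        fprintf(stderr, "open failed, errno = %d\n", errno);
    return 0;
}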
Two functions are defined by the C standard to help with printing error messages.
#include <string.h>

char *strerror(int errnum);
Returns: pointer to message string
This function maps errnum, which is typically the errno value, into an error message string and returns a pointer to the string.
The perror function produces an error message on the standard error, based on the current value of errno, and returns.
#include <stdio.h>

void perror(const char *msg);
It outputs the string pointed to by msg, followed by a colon and a space, followed by the error message corresponding to the value of errno, followed by a newline.
Example
Figure 1.8 shows the use of these two error functions.
If this program is compiled into the file a.out, we have
$ ./a.out
EACCES: Permission denied
./a.out: No such file or directory
Note that we pass the name of the program (argv[0], whose value is ./a.out) as the argument to perror. This is a standard convention in the UNIX System. By doing this, if the program is executed as part of a pipeline, as in
prog1 < inputfile | prog2 | prog3 > outputfile
we are able to tell which of the three programs generated a particular error message.
Figure 1.8. Demonstrate strerror and perror
#include "apue.h" #include <errno.h> int main(int argc, char *argv[]) { fprintf(stderr, "EACCES: %s\n", strerror(EACCES)); errno = ENOENT; perror(argv[0]); exit(0); }
Instead of calling either strerror or perror directly, all the examples in this text use the error functions. The error functions let us use the variable argument list facility of ISO C to handle error conditions with a single C statement.
Error Recovery
The errors defined in <errno.h> can be divided into two categories: fatal and nonfatal. A fatal error has no recovery action. The best we can do is print an error message on the user's screen or write an error message into a log file, and then exit. Nonfatal errors, on the other hand, can sometimes be dealt with more robustly. Most nonfatal errors are temporary in nature, such as with a resource shortage, and might not occur when there is less activity on the system.
Resource-related nonfatal errors include EAGAIN, ENFILE, ENOBUFS, ENOLCK, ENOSPC, ENOSR, EWOULDBLOCK, and sometimes ENOMEM. EBUSY can be treated as a nonfatal error when it indicates that a shared resource is in use. Sometimes, EINTR can be treated as a nonfatal error when it interrupts a slow system call.
The typical recovery action for a resource-related nonfatal error is to delay a little and try again later. This technique can be applied in other circumstances. For example, if an error indicates that a network connection is no longer functioning, it might be possible for the application to delay a short time and then reestablish the connection. Some applications use an exponential backoff algorithm, waiting a longer period of time each iteration.
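A minimal sketch of such a retry loop (an illustration, not code from this text; the try_operation function is hypothetical and stands in for any call that can fail with a temporary, resource-related error):

#include <errno.h>
#include <unistd.h>

extern int try_operation(void);         /* hypothetical: 0 on success, -1 with errno set on failure */

int
retry_with_backoff(void)
{
    unsigned int delay = 1;             /* seconds */

    for (int tries = 0; tries < 5; tries++) {
        if (try_operation() == 0)
            return 0;                   /* success */
        if (errno != EAGAIN)
            return -1;                  /* not a temporary error; give up */
        sleep(delay);
        delay *= 2;                     /* exponential backoff: wait longer each time */
    }
    return -1;                          /* still failing after several attempts */
}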
Ultimately, it is up to the application developer to determine which errors are recoverable. If a reasonable strategy can be used to recover from an error, we can improve the robustness of our application by avoiding an abnormal exit.
USER IDENTIFICATION
User ID
The user ID from our entry in the password file is a numeric value that identifies us to the system. This user ID is assigned by the system administrator when our login name is assigned, and we cannot change it. The user ID is normally assigned to be unique for every user. We'll see how the kernel uses the user ID to check whether we have the appropriate permissions to perform certain operations.
We call the user whose user ID is 0 either root or the superuser. The entry in the password file normally has a login name of root, and we refer to the special privileges of this user as superuser privileges. If a process has superuser privileges, most file permission checks are bypassed. Some operating system functions are restricted to the superuser. The superuser has free rein over the system.
Client versions of Mac OS X ship with the superuser account disabled; server versions ship with the account already enabled. Instructions are available on Apple's Web site describing how to enable it. See http://docs.info.apple.com/article.html?artnum=106290.
Group ID
Our entry in the password file also specifies our numeric group ID. This too is assigned by the system administrator when our login name is assigned. Typically, the password file contains multiple entries that specify the same group ID. Groups are normally used to collect users together into projects or departments. This allows the sharing of resources, such as files, among members of the same group. We can set the permissions on a file so that all members of a group can access the file, whereas others outside the group cannot.
There is also a group file that maps group names into numeric group IDs. The group file is usually /etc/group.
The use of numeric user IDs and numeric group IDs for permissions is historical. With every file on disk, the file system stores both the user ID and the group ID of a file's owner. Storing both of these values requires only four bytes, assuming that each is stored as a two-byte integer. If the full ASCII login name and group name were used instead, additional disk space would be required. In addition, comparing strings during permission checks is more expensive than comparing integers.
Users, however, work better with names than with numbers, so the password file maintains the mapping between login names and user IDs, and the group file provides the mapping between group names and group IDs. The ls -l command, for example, prints the login name of the owner of a file, using the password file to map the numeric user ID into the corresponding login name.
Early UNIX systems used 16-bit integers to represent user and group IDs. Contemporary UNIX systems use 32-bit integers.
Example
The program in Figure 1.9 prints the user ID and the group ID.
We call the functions getuid and getgid to return the user ID and the group ID. Running the program yields
$ ./a.out
uid = 205, gid = 105
Figure 1.9. Print user ID and group ID
#include "apue.h" int main(void) { printf("uid = %d, gid = %d\n", getuid(), getgid()); exit(0); }
Supplementary Group IDs
In addition to the group ID specified in the password file for a login name, most versions of the UNIX System allow a user to belong to additional groups. This started with 4.2BSD, which allowed a user to belong to up to 16 additional groups. These supplementary group IDs are obtained at login time by reading the file /etc/group and finding the first 16 entries that list the user as a member. As we shall see in the next chapter, POSIX requires that a system support at least eight supplementary groups per process, but most systems support at least 16.
SIGNALS
Signals are a technique used to notify a process that some condition has occurred. For example, if a process divides by zero, the signal whose name is SIGFPE (floating-point exception) is sent to the process. The process has three choices for dealing with the signal.
Ignore the signal. This option isn't recommended for signals that denote a hardware exception, such as dividing by zero or referencing memory outside the address space of the process, as the results are undefined.
Let the default action occur. For a divide-by-zero condition, the default is to terminate the process.
Provide a function that is called when the signal occurs (this is called "catching" the signal). By providing a function of our own, we'll know when the signal occurs and we can handle it as we wish.
Many conditions generate signals. Two terminal keys, called the interrupt key (often the DELETE key or Control-C) and the quit key (often Control-backslash), are used to interrupt the currently running process. Another way to generate a signal is by calling the kill function. We can call this function from a process to send a signal to another process. Naturally, there are limitations: we have to be the owner of the other process (or the superuser) to be able to send it a signal.
Example
Recall the bare-bones shell example (Figure 1.7). If we invoke this program and press the interrupt key, the process terminates because the default action for this signal, named SIGINT, is to terminate the process. The process hasn't told the kernel to do anything other than the default with this signal, so the process terminates.
To catch this signal, the program needs to call the signal function, specifying the name of the function to call when the SIGINT signal is generated. The function is named sig_int; when it's called, it just prints a message and a new prompt. Adding 11 lines to the program in Figure 1.7 gives us the version in Figure 1.10. (The 11 new lines are indicated with a plus sign at the beginning of the line.)
Figure 1.10. Read commands from standard input and execute them
#include "apue.h" #include <sys/wait.h> + static void sig_int(int); /* our signal-catching function */ + int main(void) { char buf[MAXLINE]; /* from apue.h */ pid_t pid; int status; + if (signal(SIGINT, sig_int) == SIG_ERR) + err_sys("signal error"); + printf("%% "); /* print prompt (printf requires %% to print %) */ while (fgets(buf, MAXLINE, stdin) != NULL) { if (buf[strlen(buf) - 1] == "\n") buf[strlen(buf) - 1] = 0; /* replace newline with null */ if ((pid = fork()) < 0) { err_sys("fork error"); } else if (pid == 0) { /* child */ execlp(buf, buf, (char *)0); err_ret("couldn't execute: %s", buf); exit(127); } /* parent */ if ((pid = waitpid(pid, &status, 0)) < 0) err_sys("waitpid error"); printf("%% "); } exit(0); } + + void + sig_int(int signo) + { + printf("interrupt\n%% "); + }
TIME VALUES
Historically, UNIX systems have maintained two different time values:
Calendar time. This value counts the number of seconds since the Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time (UTC). (Older manuals refer to UTC as Greenwich Mean Time.) These time values are used to record the time when a file was last modified, for example.
The primitive system data type time_t holds these time values.
Process time. This is also called CPU time and measures the central processor resources used by a process. Process time is measured in clock ticks, which have historically been 50, 60, or 100 ticks per second.
The primitive system data type clock_t holds these time values.
When we measure the execution time of a process, we'll see that the UNIX System maintains three values for a process:
Clock time
User CPU time
System CPU time
The clock time, sometimes called wall clock time, is the amount of time the process takes to run, and its value depends on the number of other processes being run on the system. Whenever we report the clock time, the measurements are made with no other activities on the system.
The user CPU time is the CPU time attributed to user instructions. The system CPU time is the CPU time attributed to the kernel when it executes on behalf of the process. For example, whenever a process executes a system service, such as read or write, the time spent within the kernel performing that system service is charged to the process. The sum of user CPU time and system CPU time is often called the CPU time.
It is easy to measure the clock time, user time, and system time of any process: simply execute the time(1) command, with the argument to the time command being the command we want to measure. For example:
$ cd /usr/include
$ time -p grep _POSIX_SOURCE */*.h > /dev/null
real     0m0.81s
user     0m0.11s
sys      0m0.07s
The output format from the time command depends on the shell being used, because some shells don't run /usr/bin/time, but instead have a separate built-in function to measure the time it takes commands to run.
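The same three values can also be obtained from within a program. A minimal sketch (not an example from this text) using the POSIX times function, which reports elapsed, user, and system time in clock ticks:

#include <sys/times.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
    struct tms  start, end;
    clock_t     wall_start, wall_end;
    long        ticks = sysconf(_SC_CLK_TCK);   /* clock ticks per second */

    wall_start = times(&start);
    /* ... the work we want to measure goes here ... */
    wall_end = times(&end);

    printf("real %.2f\n", (wall_end - wall_start) / (double)ticks);
    printf("user %.2f\n", (end.tms_utime - start.tms_utime) / (double)ticks);
    printf("sys  %.2f\n", (end.tms_stime - start.tms_stime) / (double)ticks);
    return 0;
}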
SYSTEM CALLS AND LIBRARY FUNCTIONS
All operating systems provide service points through which programs request services from the kernel. All implementations of the UNIX System provide a well-defined, limited number of entry points directly into the kernel called system calls (recall Figure 1.1). Version 7 of the Research UNIX System provided about 50 system calls, 4.4BSD provided about 110, and SVR4 had around 120. Linux has anywhere between 240 and 260 system calls, depending on the version. FreeBSD has around 320.
The system call interface has always been documented in Section 2 of the UNIX Programmer's Manual. Its definition is in the C language, regardless of the actual implementation technique used on any given system to invoke a system call. This differs from many older operating systems, which traditionally defined the kernel entry points in the assembler language of the machine.
The technique used on UNIX systems is for each system call to have a function of the same name in the standard C library. The user process calls this function, using the standard C calling sequence. This function then invokes the appropriate kernel service, using whatever technique is required on the system. For example, the function may put one or more of the C arguments into general registers and then execute some machine instruction that generates a software interrupt in the kernel. For our purposes, we can consider the system calls as being C functions.
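On Linux, for example, the same kernel service can be reached either through the C library wrapper or through the generic syscall function. A small, Linux-specific sketch (not an example from this text):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>

int
main(void)
{
    /* the library wrapper and the raw system call reach the same kernel entry point */
    printf("getpid()            = %ld\n", (long)getpid());
    printf("syscall(SYS_getpid) = %ld\n", syscall(SYS_getpid));
    return 0;
}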
Section 3 of the UNIX Programmer's Manual defines the general-purpose functions available to programmers. These functions aren't entry points into the kernel, although they may invoke one or more of the kernel's system calls. For example, the printf function may use the write system call to output a string, but the strcpy (copy a string) and atoi (convert ASCII to integer) functions don't involve the kernel at all.
From an implementor's point of view, the distinction between a system call and a library function is fundamental. But from a user's perspective, the difference is not as critical. From our perspective in this text, both system calls and library functions appear as normal C functions. Both exist to provide services for application programs. We should realize, however, that we can replace the library functions, if desired, whereas the system calls usually cannot be replaced.
Consider the memory allocation function malloc as an example. There are many ways to do memory allocation and its associated garbage collection (best fit, first fit, and so on). No single technique is optimal for all programs. The UNIX system call that handles memory allocation, sbrk(2), is not a general-purpose memory manager. It increases or decreases the address space of the process by a specified number of bytes. How that space is managed is up to the process. The memory allocation function, malloc(3), implements one particular type of allocation. If we don't like its operation, we can define our own malloc function, which will probably use the sbrk system call. In fact, numerous software packages implement their own memory allocation algorithms with the sbrk system call. Figure 1.11 shows the relationship between the application, the malloc function, and the sbrk system call.
Figure 1.11. Separation of malloc function and sbrk system call

Here we have a clean separation of duties: the system call in the kernel allocates an additional chunk of space on behalf of the process. The malloc library function manages this space from user level.
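To make that separation concrete, here is a deliberately simplistic sketch (not a real allocator, and not code from this text) of a bump allocator built on sbrk: the kernel grows the address space, and the user-level code parcels it out. It never frees memory and handles alignment only by rounding, details a real malloc must manage.

#include <unistd.h>
#include <stdint.h>
#include <stddef.h>

/* toy_alloc: obtain space from the kernel with sbrk and hand it out sequentially */
static void *
toy_alloc(size_t size)
{
    void *p;

    size = (size + 15) & ~(size_t)15;   /* round up to a 16-byte multiple */
    p = sbrk((intptr_t)size);           /* ask the kernel to grow the process's address space */
    if (p == (void *)-1)
        return NULL;                    /* sbrk failed */
    return p;                           /* how the space is managed is up to user level */
}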
Another example to illustrate the difference between a system call and a library function is the interface the UNIX System provides to determine the current time and date. Some operating systems provide one system call to return the time and another to return the date. Any special handling, such as the switch to or from daylight saving time, is handled by the kernel or requires human intervention. The UNIX System, on the other hand, provides a single system call that returns the number of seconds since the Epoch: midnight, January 1, 1970, Coordinated Universal Time. Any interpretation of this value, such as converting it to a human-readable time and date using the local time zone, is left to the user process. The standard C library provides routines to handle most cases. These library routines handle such details as the various algorithms for daylight saving time.
An application can call either a system call or a library routine. Also realize that many library routines invoke a system call. This is shown in Figure 1.12.
Figure 1.12. Difference between C library functions and system calls

Another difference between system calls and library functions is that system calls usually provide a minimal interface, whereas library functions often provide more elaborate functionality. We've seen this already in the difference between the sbrk system call and the malloc library function.
The process control system calls (fork, exec, and wait) are usually invoked by the user's application code directly. (Recall the bare-bones shell in Figure 1.7.) But some library routines exist to simplify certain common cases: the system and popen library routines, for example.
To define the interface to the UNIX System that most programmers use, we have to describe both the system calls and some of the library functions. If we described only the sbrk system call, for example, we would skip the more programmer-friendly malloc library function that many applications use. In this text, we'll use the term function to refer to both system calls and library functions, except when the distinction is necessary.
SUMMARY
This has been a short tour of the UNIX System. We've described some of the fundamental terms that we'll encounter over and over again. We've seen numerous small examples of UNIX programs to give us a feel for what the remainder of the text talks about.
Five Things To Do To Defend Against Duqu
Protect your infrastructure from Duqu in the interim, just in case -- with a new "hot fix" released by Microsoft, among other precautions
Whether Duqu is related to Stuxnet's authors or its source code is the least of your worries if your organization ends up in the bull's eye of this new targeted attack. Microsoft says it considers the threat "low risk" at this point. Trouble is, the names of the organizations that have been targeted thus far have been kept confidential, so we don't know just what Duqu is after exactly, and whether it's focused on a particular industry or region.
"I don't expect Duqu to stop. It looks to be manned on the inside and not on autopilot -- they are actively setting up new modules, etc., to keep the operation alive," says Don Jackson, a director with Dell Secureworks Counter Threat Unit. "So [right now] it's an intelligence game."
Even so, there are still some things organizations can do to protect themselves while the world waits for more information on this attack, as well as for Microsoft's patch for the zero-day flaw that was exploited and used with Word to spread the infection. Microsoft late today issued a "hot fix" along with an advisory about Duqu and assured users that antivirus vendors in its MAPP program would be updating their products with Duqu signatures very soon.
Even if you're not a certificate authority or manufacturing firm -- the two industries cited publicly so far as having Duqu victims -- security experts say there are some steps you can take to help protect your infrastructure from this new targeted attack.
1. Install the just-released "hot fix" and workaround from Microsoft. Microsoft is working on a patch and will deliver it via its regular security bulletin release -- just not in time for next week's batch. So in the meantime, Microsoft today began offering a hot fix for the threat that blocks access to t2embed.dll, which is used in the zero-day attack in Duqu.
The flaw lies in the Win32k TrueType font parsing engine, according to Microsoft: "An attacker who successfully exploited this vulnerability could run arbitrary code in kernel mode. The attacker could then install programs; view, change, or delete data; or create new accounts with full user rights. We are aware of targeted attacks that try to use the reported vulnerability; overall, we see low customer impact at this time. This vulnerability is related to the Duqu malware," Microsoft said in an advisory today.
Jerry Bryant, group manager for response communications in Microsoft's Trustworthy Computing Group, says Microsoft is closely monitoring further developments with Duqu. "As previously stated, the risk for customers remains low. However, that is subject to change, so we encourage customers to either apply the workaround or ensure their anti-malware vendor has added new signatures based on the information we’ve provided them to ensure protections are in place for this issue," he says.
2. Run updated anti-malware -- and use standard security best practices. Not all antivirus products can detect Duqu yet, but security experts say to keep updating to be sure you get protection for Duqu as soon as it's released.
"Detections related to Duqu are mapped to the W32.Duqu family of signatures. We also highly encourage people not to click on attachments in email that seems suspicious, even if it comes from someone they know," says Kevin Haley, director of product management for Symantec.
Secureworks recommends using any host-based protection in addition to the typical network monitoring and user access controls that would help thwart Duqu. Tarek Saadawi, professor of electrical engineering at The City College of New York’s Grove School of Engineering, says because Duqu sniffs keyboard strokes and tries to steal passwords to internal systems, users should also protect their home computers and networks. Aside from updating AV and Windows, be sure to update third-party applications and shut down computers at night, he says.
3. Scan or filter Word documents from unknown sources. One handy tool here is Microsoft's MOICE (Microsoft Office Isolated Conversion Environment), which checks for malformed Word documents, Secureworks' Jackson says. "That's how Duqu starts: with a malformed Word file. It's playing a trick on Microsoft Word to run this code," he says.
Jackson suggests filtering Word documents from unknown sources and scanning them with MOICE until there's a patch for the new zero-day attack. Another option is to use something like FireEye's software: "FireEye loads the Word document inside the VM and [executes] malicious detection," he says.
4. Monitor for traffic from potentially infected machines trying to "phone home" to Duqu. Be on the lookout for machines trying to connect to a Duqu command-and-control (C&C) server or trying to resolve to a Duqu-related domain. Two C&C servers have been taken down thus far, but there are likely new ones. The IP addresses of the C&Cs that were found and ultimately shuttered: 206.183.111.97 and 77.241.93.160.
"I'm confident that there are other command-and-control servers either going up now or that are already up," Jackson says. "We are a step behind them in spotting new ones.
"Duqu has a stay-alive module … and has the ability to change itself, so anything you can do to block IP addresses will help," he says.
5. Watch for any Port 443 traffic that's unencrypted, and keep an eye out for ~DQ files. Watching for unencrypted traffic on the HTTP-S or SSL-based traffic port can help detect malware, including a possible Duqu infection. "If it's not encrypted [traffic there], it's probably bad," says Secureworks' Jackson.
Meanwhile, a Duqu-infected file may start with "~DQ" in the Windows temporary file directory, so be on the lookout for that as well, Secureworks recommends.
Samsung Transparent LCD Panels: We see through them
The shopping experience is getting more "Minority Report"-ish by the day, what with innovations like digital billboards that know your age and gender and serve up ads accordingly.
Now Samsung is looking to buy into the future-of-retail space with a transparent LCD panel that can be used to dramatically enhance kiosks, store windows, and billboards with text and images that do fancy tricks like rotate and fade in and out rather than just sit there. Think bus shelters with see-through walls displaying scrolling schedules and clothing shop windows that feature models sashaying down the catwalk.
Yesterday in San Francisco, the company showed CNET a 22-inch transparent LCD panel built into a display case housing a Samsung Galaxy Tab. That size panel has already gone into production, with a 46-incher on the way.
During the demo, the 7-inch Tab sat behind a window-like facade programmed to display text and images spotlighting some of the product's specs and social-networking capabilities. Bill Beaton, senior manager of LCD marketing for Samsung, even did a little hand dance behind the rotating imagery to demonstrate the panel's high transparency rate.
The panels come in black-and-white and color versions, and have a contrast ratio of 500:1 with WSXGA+ (1,680x1,050) resolution. They are HDMI- and USB-enabled and utilize ambient light such as sunlight, thus reducing their dependence on electricity.
This is not Samsung's first stab at transparency, as its IceTouch YP-H1 and laptop sport a transparent touch screen. Other companies, including LG, Sony Ericsson and Korean materials maker NeoView Kolon have ventured into transparent-technology territory, as well.
Samsung mostly views its transparent panels as a tool to make advertising more dynamic (and says it's already working with unnamed major retail partners interested in using them that way). But it also imagines them as potential interactive communication devices for corporations and schools.
Beyond those applications, Beaton showed a cool image of an office building made multicolored by smart windows using the transparent panels. That's just a concept at this point. Still, we're suddenly imagining "mood houses" that sport yellow windows when inhabitants feel happy, and blue windows when they're bummed.
Debugging Applications
Every significant piece of software will contain defects, typically two to five per 100 lines of code. These mistakes lead to programs and libraries that don't perform as required, often behaving differently from how they are supposed to. Bug tracking, identification, and removal can consume a large amount of a programmer's time during software development.
Types of Errors:
Specification Errors: If a program is incorrectly specified, it will inevitably fail to perform as required. Even the best programmer in the world can sometimes write the wrong program. Before you start programming (or designing), make sure that you know and understand clearly what your program needs to do. You can detect and remove many (if not all) specification errors by reviewing the requirements with those who will use the program and agreeing that they are correct.
Design Errors: Programs of any size need to be designed before they’re created. It’s not usually enough to sit down at a computer keyboard, type source code directly, and expect the program to work the first time. Take time to think about how you will construct the program, what data structures you’ll need, and how they will be used. Try to work out the details in advance, because it can save many rewrites later on.
Coding Errors: Of course, everyone makes typing errors. Creating the source code from your design is an imperfect process. This is where many bugs will creep in. When you’re faced with a bug in a program, don’t overlook the possibility of simply rereading the source code or asking someone else to. It’s surprising just how many bugs you can detect and remove by talking through the implementation with someone else.
Try executing the core of the program on paper, a process sometimes called dry running. For the most important routines, write down the values of inputs and calculate the outputs step by step. You don’t always have to use a computer to debug, and sometimes it can be the computer causing the problems. Even the people who write libraries, compilers, and operating systems make mistakes! On the other hand, don’t be too quick to blame the tools; it is more likely that there’s a bug in a new program than in the compiler.
Five Stages of Debugging:
Testing: Finding out what defects or bugs exist
Stabilisation: Making the bugs reproducible
Localisation: Identifying the line(s) of code responsible
Correction: Fixing the code
Verification: Making sure the fix works
The Future of OpenGL
Over the lifetime of OpenGL, many features have been added to improve the performance of rendering complex scenes. Unfortunately, the API has become very large, with a whole array of options to choose from for each particular rendering task. Such a large number of options makes it very difficult to determine which technique will perform the most efficiently; in other words, it has become difficult to find what's known as the "fast path." The introduction of the deprecation model in OpenGL 3.0 promises to rectify this situation. The majority of the API (specifically the fixed-function part) has been marked as deprecated; the remaining functions provide the fastest methods of rendering.
Here is a list of the main functionality deprecated in Version 3.0. If you have some previous knowledge of OpenGL, this list may be of interest to you.
Color Index mode
OpenGL shading language versions 1.10 and 1.20 (now replaced with 1.30)
Immediate mode
Fixed-function vertex processing
Matrix stacks
Client vertex arrays
Rectangles
Raster position
Non-sprite points
Wide lines and line stipple
Quadrilateral and polygon primitives
Separate polygon drawing mode
Polygon stipple
Pixel drawing
Bitmaps
Texture wrap mode—GL_CLAMP
Display lists
The selection buffer
The accumulation buffer
Alpha test
Attribute stacks
Evaluators
Unified extension string
It's quite a lot, isn't it? Most of the above functionality is now implemented using shaders, and some parts (such as the matrix stack) can be implemented in separate libraries. There are a few items listed above that are no longer relevant because they have been replaced by more efficient methods (e.g., display lists), and some have been removed because they don't really belong in a rendering API (e.g., the selection buffer). By the end of this chapter, you will understand how to render using future-proof, non-deprecated functionality by replacing the fixed-function features we have relied on so far (vertex arrays and the matrix stack) with new techniques using shaders.
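As a sketch of what that shift looks like in practice (an illustration, not code from this text; it assumes an OpenGL 3.0 context and an extension loader such as GLEW have already been set up), the deprecated immediate-mode calls are replaced by a vertex buffer object and a draw call:

/* Deprecated immediate mode, removed from the forward-compatible profile:
 *
 *     glBegin(GL_TRIANGLES);
 *     glVertex2f(-0.5f, -0.5f);
 *     glVertex2f( 0.5f, -0.5f);
 *     glVertex2f( 0.0f,  0.5f);
 *     glEnd();
 *
 * Non-deprecated approach: upload the vertex data once, then draw from the buffer.
 */
#include <GL/glew.h>

static GLuint vbo;

static void
init_triangle(void)
{
    const GLfloat verts[] = { -0.5f, -0.5f,  0.5f, -0.5f,  0.0f, 0.5f };

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
}

static void
draw_triangle(void)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);               /* attribute 0 = position in the vertex shader */
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (void *)0);
    glDrawArrays(GL_TRIANGLES, 0, 3);           /* assumes a shader program is already bound */
    glDisableVertexAttribArray(0);
}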
In this tutorial, I describe various common graphic design elements in modern web ("2.0") design style.
I then attempt to explain why they work (i.e. why they have become common), as well as how, when and where you might use each element in your designs.
It follows on from my Current Style article, and analyses in greater depth the design features of the current "Web 2.0" design style.
To learn how to design Web2.0 sites yourself, you must read “Save the Pixel – The Art of Simple Web Design”, which is a comprehensive guidebook to the principles and techniques of Web2.0 design. Read more >>
Adobe Launching Six New Creativity Apps for Tablets

There are as many as six new apps being developed at the house of Adobe, all of which are aimed at letting users create and edit content on their tablet devices. The announcement was made at MAX 2011, the company's annual developer conference held in Los Angeles. The apps will work equally well on the iPad and on Android-based tablets. However, Adobe has yet to offer a concrete release date for the iPad versions, with the earliest being pegged at around early 2012. For Android-based tablets, the release should come much earlier, around next month.
The apps are part of Adobe's new cloud-based initiative, aptly termed "Adobe Creative Cloud," which is being developed as a means of accessing apps relevant for desktop and tablet devices or for finding essential creative services. Of course, the cloud-based initiative is also designed to allow sharing among various users. Adobe has also gone on to acquire two firms, Nitobi Software and Typekit, to ensure easy implementation of the new apps, though the financial aspects of the deals have been kept under wraps. Of the two firms, Nitobi Software offers expertise that allows quicker development of mobile apps, while Typekit offers a cloud-based library of fonts used by artists. Adobe has priced the apps at $9.99 each.
The six apps are:
Adobe Photoshop Touch: allows editing of image files with core Photoshop features optimized for a tablet environment.
Adobe Collage: an advanced drawing tool that allows importing photos and transforming them into "modern, conceptual mood boards."
Adobe Debut: enables users to showcase their designs and creations on a tablet device almost anywhere.
Adobe Ideas: a vector-based drawing tool that accepts both stylus and finger input.
Adobe Kuler: allows users to generate color themes, with hundreds of thousands of themes available via the creative community.
Adobe Proto: allows users to develop interactive wireframes and prototypes for websites and mobile apps on a tablet.
All the above mentioned apps are to be available for the iPad in early 2012. The sole exception is Adobe Ideas, which can already be purchased from the Apple App Store for $5.99.
Read more: goodreader.com
The Future of HTML
HTML Timeline
In the early '90s, Tim Berners-Lee conceived HTML, but there was no formal HTML 1.0 specification written and, despite the similarities in syntax, it was not formally based on SGML.
Work continued over the next few years and in 1995, HTML 2.0 was published as RFC 1866 which formally defined HTML as an application of SGML. However, browsers still didn’t bother to implement SGML parsers and, even at this early stage, many proprietary extensions were starting to appear.
From around 1996, the browser wars were in full swing. There were proprietary extensions flying in from all directions and an abundance of broken pages relying on browser bugs to work. This eventually became widely known as “Tag Soup”. In an effort to standardise this mess, the W3C published HTML 3.2 in ’97 and 4.0 in the following year which formally deprecated many of the presentational features that had crept in.
By now it seemed that the life of HTML was coming to an end and work on XHTML began. After HTML 4.01 was published at the end of ’99 to resolve a few minor issues, work on HTML as an application of SGML ceased and the HTML Working Group have been pushing ahead with XHTML ever since.
In what seemed like an effort to further distance themselves from these huge mistakes of the past, the HTML Working Group began work on XHTML 2.0 in 2002. However, it has not been designed with backwards compatibility in mind; it has been designed as a way to start over fresh with a new markup language, although many see this as a major barrier to XHTML 2.0's chances of ever taking off.
WHATWG
Over the years, Apple, Mozilla and Opera were becoming increasingly concerned about the W3C's direction with XHTML and apparent disregard for the needs of real-world authors. So, in 2004, led by Ian Hickson, these organisations set out with a mission to meet the needs of both users and developers; and the Web Hypertext Application Technology Working Group was born.
WHATWG Goals
The goals of the WHATWG include documenting existing, real-world browser behaviour; standardising widely supported and useful proprietary extensions and developing practical new features that meet the demands of both users and developers whilst ensuring backwards compatibility and defining robust error handling techniques.
The Specs
Over the past 2 years, they've been planning and working on 3 separate specifications: Web Applications 1.0, Web Forms 2.0 and Web Controls 1.0. Together, these 3 specs form what is collectively known as HTML 5.
Web Applications 1.0
The Web Apps 1.0 spec is redefining the syntax and parsing requirements of HTML to match the way existing browsers handle tag soup, introducing new document structure and semantics, and DOM APIs, many of which are designed specifically for building applications.
Web Forms 2.0
The Web Forms 2.0 spec aims to extend the HTML 4 forms with new controls, a repetition model, improved client side validation and new DOM APIs for working with forms and controls.
Web Controls 1.0
Lastly, the Web Controls 1.0 spec aims to further enhance CSS and the DOM for building customised controls and widgets. However, at present, not much work has been done in this area and so there isn't much to say about it yet.
Document Representation
HTML5 introduces the concept of serialisations for an HTML document. A serialisation in this context refers to its physical representation. HTML5 uses the HTML serialisation and XHTML5 uses the XML serialisation. Because of this, the distinction between an HTML and an XHTML document is reduced.
In most cases, either serialisation can be used to represent exactly the same document. Although they will be parsed according to different rules, browsers will create a DOM, which is simply another way of representing the document.
There are, however, some features that cannot be represented in all of these. For instance, namespaces can be used in the DOM and in the XHTML serialisation, but cannot be used in the HTML serialisation.
As a consequence, this resolves the HTML vs. XHTML debate once and for all. These days, many authors use an XHTML 1.0 DOCTYPE and then proceed to claim they’re using XHTML, but in reality, they’re using HTML because browsers make the decision about whether to treat a document as HTML or XHTML based on the MIME type.
So, unlike previous versions, the choice of using either HTML or XHTML is not dependent upon the DOCTYPE used. It is solely dependent upon the MIME type. If the document is served as text/html, it is HTML and gets parsed as such; but if it is served with an XML MIME type, like application/xhtml+xml, it is XHTML and gets parsed as XML.
Browser Support for HTML
In reality, parsing HTML is a nightmare. The web is literally filled with an infinite number of pages, growing every day, and browsers are forced to handle it all gracefully. They can't allow themselves to choke on invalid HTML, regardless of how broken it is.
The major problem is that there is a serious lack of interoperability, which is a direct result of the fact that parsing and error handling were not well defined in HTML, and most certainly not defined in a way that is compatible with the web.
There are also many proprietary extensions out there that are both widely used and supported. The problem with this is that these features aren't well-defined and browser vendors have spent years reverse engineering them from each other.
While reverse engineering has gone some way in fostering interoperability between browsers, the process is far from perfect and it would be much better if the widely used and deployed extensions could be thoroughly documented and interoperably implemented; which is exactly what the WHATWG is attempting to do.
Interoperability Issues
To illustrate the lack of interoperability, let's take a look at a simple, yet very common, markup error and show how it is handled by different browsers. In this example, sketched below, the strong and em elements have been badly nested.
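The original slides aren't reproduced here, but markup along the following lines (an assumed reconstruction) exhibits the behaviour described: the strong element is closed while the em element it contains is still open.

<p><strong>a<em>b</strong>c</em></p>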
In this case, Firefox and Safari produce the same result, although they use different parsing algorithms to do so. In the DOM representation, notice that there are 2 em elements in the DOM, yet only one appears in the markup. To work around the error, they’ve effectively closed the em element when its parent element closed, and created a new one immediately afterwards.
Compare this with IE, however, which, instead of creating 2 em elements, creates a broken DOM that isn’t strictly a tree. Notice how the em element has 2 child text nodes, b and c, but the text node c references the p element as its parent, rather than the em.
Lastly, Opera creates a DOM similar to that in IE, except that it is a proper tree structure. The problem with this approach, however, is that the text node c is a descendant of the strong element, but it is not rendered as such. By default, it is only rendered in italics, not in bold, as you would expect with this DOM.
So you can see, with just this one simple example, that browsers do handle markup differently. And keep in mind that the web is filled with an infinite number of pages, with errors far more complicated than that.
HTML 5 Parsing (text/html only)
The WHATWG is attempting to resolve this situation by thoroughly documenting and defining the parsing requirements for handling HTML. They are achieving this goal by analysing the behaviour of current browsers–primarily IE, Firefox, Opera and Safari–and defining an algorithm that will be compatible with the web, in the hope that it will be implemented by all future browsers.
To help ensure full interoperability between browsers, one of the most important issues to deal with is error handling. We can never expect all web pages to be error free, but, as users, we should always expect browsers to handle it. So the algorithm has been specced, at least in theory, to deal with every possible error condition.
DOCTYPEs and DTDs
In HTML, DOCTYPEs serve two practical purposes: validation and DOCTYPE sniffing. These days, most standards-aware developers use either an HTML or XHTML, Strict or Transitional DOCTYPE. Since HTML 5 is no longer formally based on SGML and because DTD based validation has many limitations with respect to conformance checking, HTML 5 will no longer recommend the use of a DTD. Rather, conformance checkers will be free to use whatever methodology they like to check the document for validity and conformance, so long as the end result is the same.
The DOCTYPE
However, there is still the practical issue of triggering standards mode and some form of DOCTYPE is required for that in HTML.
In HTML 4, the DOCTYPE was long and complicated, and very few people can actually remember it all. The complex PUBLIC and SYSTEM identifiers are used to refer to the DTD. But because there is no DTD in HTML5, we’ve taken out the PUBLIC and SYSTEM identifiers and left the minimal amount of code that is both easy to remember and triggers standards mode. Thus, in HTML 5, the DOCTYPE will simply be <!DOCTYPE html>.
This does not apply to XHTML 5, for which there is no DOCTYPE sniffing and no need for any DOCTYPE at all.
New Structures
These days, it’s fairly common to use div elements for the major structures on the page, such as headers, footers and columns, giving each one a descriptive id or class. But the use of divs is simply because current versions of HTML lack the necessary semantics for describing these sections.
In extreme cases, the overuse of the non-semantic div element can lead to a syndrome, which is common amongst beginners, known as either divitis or div-mania. HTML 5 is attempting to cure this condition by introducing new elements that provide the semantics for representing each of these different sections.
There are new header and footer elements, for marking up the header and footer of a page or section.
The new nav element has been introduced for marking up navigation links; either site navigation or page navigation.
The new aside element is for content that is tangentially related to the content around it, and is typically useful for marking up side bars.
The new section element represents a generic section of a document or application, such as a chapter, for example.
The article element is like section, but is specifically for marking up content such as a news article or blog entry.
When used in conjunction with the heading elements, all of these elements provide a way to mark up nested sections with heading levels, beyond the 6 levels possible with previous versions of HTML.
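As an illustrative sketch (not from the original talk), a typical page skeleton using these new structural elements might look like this:

<body>
  <header>
    <h1>Site title</h1>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/archive">Archive</a></li>
      </ul>
    </nav>
  </header>
  <article>
    <h2>Blog entry title</h2>
    <section>
      <h3>A section within the entry</h3>
      <p>Content...</p>
    </section>
  </article>
  <aside>
    <h2>Related links</h2>
  </aside>
  <footer>
    <p>Copyright notice</p>
  </footer>
</body>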
New Semantics
HTML 5 is also introducing many other new elements for a wide range of semantic purposes, ranging from simple metadata to cool new widgets.
The new meter element provides a widget for representing scalar measurements or fractional values. For example, you could use it to show a quality rating, disk quota usage or the current temperature.
The progress element is designed to show the completion progress of a task. It has been designed to work with scripted applications that can dynamically update the progress. For example, you could use it to show the loading progress in an Ajax application, or to illustrate the user’s progress through a series of forms.
The canvas element is designed to provide a 2D drawing API, specifically for use with scripts. It can be used to render anything from simple artwork or graphs drawn from tables of data, to fancy animations or interactive applications, such as a game. There has even been some talk of introducing a 3D drawing API.
The new datagrid element represents an interactive representation of tree, list, or tabular data and provides a widget that allows the user, and a rich DOM API for scripts, to work with the data.
There is a new time element for marking up dates and times, and an m element for highlighting text. The revitalised menu element is back with improvements, in conjunction with the new command element, for providing toolbars and context menus.
The widely implemented, yet previously undocumented, embed element has been introduced, and the figure element provides a way for adding captions to images.
The details element can be used to represent additional information, available on request, and the new dialog element is for marking up conversations.
New Controls
Over the years, it’s become clear that the types of controls available in HTML4 are quite limited and have forced many sites to work around these limitations with varying degrees of complexity. Dates, for example, are often requested using 3 separate fields–one each for the day, month and year. Web Forms 2 has introduced a number of new controls for a wide range of additional datatypes.
There are several new controls for dates and times. This is the new widget that Opera provides for the datetime control. It provides a calendar for selecting the date and a clock for entering the time. Similar controls are also available for just the date, or just the time.
The new number control is for any numeric value. The advantage of this control is that, in this implementation, it provides a spin control for incrementing and decrementing the value, as well as ensuring that only numbers are entered. It doesn’t allow any non-numeric characters to be entered, so it’s one less thing for client side validation to worry about.
There’s also a new slider control available. Its value is also numeric, but it’s designed for cases where the exact value is relatively unimportant. For example, it could be used as a volume control or brightness control.
The new email control is designed specifically for e-mail addresses. The advantage of this control is that browsers could provide access to the user’s address book and also verify that a valid e-mail address has been entered.
The new URL control is also available for URIs. In this example, the browser has listed some matching addresses from the user’s browsing history.
But perhaps one of the most exciting new features is the ability to finally mark up combo boxes! These allow the user to either select a value from a list, or enter a new value. Traditionally, this limitation has been worked around using separate select lists and text boxes; or simulated using various JavaScript techniques. Now, this functionality can be provided with a single control.
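A rough sketch of how these controls might be marked up (illustrative only; the control names follow the Web Forms 2.0 drafts described above, and the combo box is expressed by pairing an input with a datalist of suggested values):

<input type="datetime" name="meeting">
<input type="number" name="quantity" value="1">
<input type="range" name="volume" min="0" max="10">
<input type="email" name="address">
<input type="url" name="homepage">

<!-- combo box: free text entry plus a list of suggestions -->
<input type="text" name="browser" list="browsers">
<datalist id="browsers">
  <option value="Firefox">
  <option value="Opera">
  <option value="Safari">
</datalist>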
Repetition Model
There are often times when you need to collect an arbitrary number of values for a set of data. For example, a ticket booking form may ask you to list the names of all the people for whom you are purchasing tickets, you may need to add multiple contacts to an address book, or, as in this example, list all the members of SG-1.
In current sites, this usually requires the user to submit the form to the server, using the Add button provided on the page, and the server to respond with a new page updated with additional rows. With this new model, the addition and removal of rows can be handled entirely on the client side.
Web Forms 2.0 has introduced new template features and buttons for replicating form controls. The add button can be used to add a new set of controls. In this example, a fourth set of fields for name and rank has been added and filled out.
Values can also be easily removed using the Remove button. When the user has completed the form, it can be submitted just like any other, with a regular submit button.
The way it works is by marking up a template in the page. Almost any element can be used as a template; you are not restricted to using table rows, as in this example. The new repeat attribute indicates that the element and its content form a template that can be replicated.
The repeat-start attribute indicates how many copies of the template should be generated when the page loads. In this example, 2 rows will be generated.
When a template is replicated, a few things need to occur. The repeat attribute is given a unique index, and the repeat-template attribute is used to refer to the ID of the template from which it was created.
Also notice the name attributes in the template row at the end. The square brackets containing the template’s ID are a special syntax: when the template is replicated, the bracketed ID is replaced with the value of the repetition index. This ensures that each control ends up with a unique name when the form is sent to the server.
For removing repetition blocks, a new remove button has been defined. When activated, it causes its nearest ancestor repetition block to be removed.
Similarly, for adding new repetition blocks, a new add button is available. When it is activated, it generates a new repetition block from the template and inserts it into the page.
Client Side Form Validation
On most sites, it’s common to find some form of client-side validation, implemented using JavaScript, designed to assist the user with completing a form. One of the biggest limitations is that current versions of HTML lack the means to describe the expected value of a form control, so validation is usually accomplished entirely with scripts.
HTML5 has introduced some new attributes on form controls for describing the expected value to enable the browser to assist with the validation. The new required attribute can be used to indicate that a value is required.
Regular expressions, which are typically embedded in form validation scripts, can now be used with the new pattern attribute to describe the exact format allowed. For instance, you could use this pattern for a username field to restrict it to alphanumeric characters only.
For numeric controls, such as number and range, it will be possible to restrict the allowable values to be within a certain range using the min and max attributes.
And, although this has been available on text boxes since the beginning, maxlength can now be specified on textareas too.
Browsers that support these features can notify the user of any mistakes and automatically prevent submission until they are corrected; or they can be used in conjunction with scripts to enhance the user experience.
DOM APIs
Along with the new markup features that have been introduced, HTML5 is also including many new features in the DOM. The DOM is a browser’s internal representation of the page, and APIs are provided to allow scripts to work with it.
There are many widely supported APIs in browsers that were previously undocumented, known collectively as DOM Level 0. These include interfaces like Window, History, and Location, and the many widely supported and used methods that aren’t defined in any current DOM spec.
Recognising that these APIs are both widely used and supported, it is considered far better to document, standardise, and improve them where possible, so that they can become interoperably implemented.
Along with these, there are also many new features being developed. The client-side storage APIs are designed to allow scripts to store data on the client side. In a way, they are similar to cookies, but with a much richer API and other enhancements.
The new Audio interface is being designed for playing small sound effects.
There are several new communication APIs, including server-sent events, which allows a page to receive notifications from the server when an event occurs. For example, it could be used for a stock ticker to be updated with new values as they change.
The network connection APIs are being designed to allow scripts to make TCP connections directly with a server. This is similar to XMLHttpRequest, but you are not restricted to just HTTP requests.
And finally, the cross-document messaging APIs are designed to allow one document to communicate with another, without the hassle of cross-domain security issues.
#web #code #html #sgml #xhtml #apple #mozilla #opera #internet #WHATWG #web applications #web forms #web controls #HTML 5 #firefox #internet explorer #atcomm
Text
UNIX Systems
All operating systems provide services for programs they run. Typical services include executing a new program, opening a file, reading a file, allocating a region of memory, getting the current time of day, and so on. The focus of this text is to describe the services provided by various versions of the UNIX operating system.
Describing the UNIX System in a strictly linear fashion, without any forward references to terms that haven't been described yet, is nearly impossible (and would probably be boring). This chapter provides a whirlwind tour of the UNIX System from a programmer's perspective. We'll give some brief descriptions and examples of terms and concepts that appear throughout the text. We describe these features in much more detail in later chapters. This chapter also provides an introduction and overview of the services provided by the UNIX System, for programmers new to this environment.
UNIX ARCHITECTURE
In a strict sense, an operating system can be defined as the software that controls the hardware resources of the computer and provides an environment under which programs can run. Generally, we call this software the kernel, since it is relatively small and resides at the core of the environment. Figure 1.1 shows a diagram of the UNIX System architecture.
Figure 1.1. Architecture of the UNIX operating system
The interface to the kernel is a layer of software called the system calls (the shaded portion in Figure 1.1). Libraries of common functions are built on top of the system call interface, but applications are free to use both. The shell is a special application that provides an interface for running other applications.
In a broad sense, an operating system is the kernel and all the other software that makes a computer useful and gives the computer its personality. This other software includes system utilities, applications, shells, libraries of common functions, and so on.
For example, Linux is the kernel used by the GNU operating system. Some people refer to this as the GNU/Linux operating system, but it is more commonly referred to as simply Linux. Although this usage may not be correct in a strict sense, it is understandable, given the dual meaning of the phrase operating system. (It also has the advantage of being more succinct.)
LOGGING IN
Login Name
When we log in to a UNIX system, we enter our login name, followed by our password. The system then looks up our login name in its password file, usually the file
/etc/passwd. If we look at our entry in the password file we see that it's composed of seven colon-separated fields: the login name, encrypted password, numeric user ID (205), numeric group ID (105), a comment field, home directory (/home/sar), and shell program (/bin/ksh).
sar:x:205:105:Stephen Rago:/home/sar:/bin/ksh
All contemporary systems have moved the encrypted password to a different file.
Shells
Once we log in, some system information messages are typically displayed, and then we can type commands to the shell program. (Some systems start a window management program when you log in, but you generally end up with a shell running in one of the windows.) A shell is a command-line interpreter that reads user input and executes commands. The user input to a shell is normally from the terminal (an interactive shell) or sometimes from a file (called a shell script). The common shells in use are summarized in Figure 1.2.
Figure 1.2. Common shells used on UNIX systems
Name                 Path         FreeBSD 5.2.1   Linux 2.4.22   Mac OS X 10.3   Solaris 9
Bourne shell         /bin/sh      •               link to bash   link to bash    •
Bourne-again shell   /bin/bash    optional        •              •               •
C shell              /bin/csh     link to tcsh    link to tcsh   link to tcsh    •
Korn shell           /bin/ksh                                                    •
TENEX C shell        /bin/tcsh    •               •              •               •
The system knows which shell to execute for us from the final field in our entry in the password file.
The Bourne shell, developed by Steve Bourne at Bell Labs, has been in use since Version 7 and is provided with almost every UNIX system in existence. The control-flow constructs of the Bourne shell are reminiscent of Algol 68.
The C shell, developed by Bill Joy at Berkeley, is provided with all the BSD releases. Additionally, the C shell was provided by AT&T with System V/386 Release 3.2 and is also in System V Release 4 (SVR4). (We'll have more to say about these different versions of the UNIX System in the next chapter.) The C shell was built on the 6th Edition shell, not the Bourne shell. Its control flow looks more like the C language, and it supports additional features that weren't provided by the Bourne shell: job control, a history mechanism, and command line editing.
The Korn shell is considered a successor to the Bourne shell and was first provided with SVR4. The Korn shell, developed by David Korn at Bell Labs, runs on most UNIX systems, but before SVR4 was usually an extra-cost add-on, so it is not as widespread as the other two shells. It is upward compatible with the Bourne shell and includes those features that made the C shell popular: job control, command line editing, and so on.
The Bourne-again shell is the GNU shell provided with all Linux systems. It was designed to be POSIX-conformant, while still remaining compatible with the Bourne shell. It supports features from both the C shell and the Korn shell.
The TENEX C shell is an enhanced version of the C shell. It borrows several features, such as command completion, from the TENEX operating system (developed in 1972 at Bolt Beranek and Newman). The TENEX C shell adds many features to the C shell and is often used as a replacement for the C shell.
Linux uses the Bourne-again shell for its default shell. In fact, /bin/sh is a link to /bin/bash. The default user shell in FreeBSD and Mac OS X is the TENEX C shell, but they use the Bourne shell for their administrative shell scripts because the C shell's programming language is notoriously difficult to use. Solaris, having its heritage in both BSD and System V, provides all the shells shown in Figure 1.2. Free ports of most of the shells are available on the Internet.
Throughout the text, we will use parenthetical notes such as this to describe historical notes and to compare different implementations of the UNIX System. Often the reason for a particular implementation technique becomes clear when the historical reasons are described.
Throughout this text, we'll show interactive shell examples to execute a program that we've developed. These examples use features common to the Bourne shell, the Korn shell, and the Bourne-again shell.
FILES AND DIRECTORIES
File System
The UNIX file system is a hierarchical arrangement of directories and files. Everything starts in the directory called root whose name is the single character /.
A directory is a file that contains directory entries. Logically, we can think of each directory entry as containing a filename along with a structure of information describing the attributes of the file. The attributes of a file are such things as the type of file (regular file, directory), the size of the file, the owner of the file, the permissions for the file (whether other users may access this file), and when the file was last modified. The stat and fstat functions return a structure of information containing all the attributes of a file.
We make a distinction between the logical view of a directory entry and the way it is actually stored on disk. Most implementations of UNIX file systems don't store attributes in the directory entries themselves, because of the difficulty of keeping them in synch when a file has multiple hard links.
Filename
The names in a directory are called filenames. The only two characters that cannot appear in a filename are the slash character (/) and the null character. The slash separates the filenames that form a pathname (described next) and the null character terminates a pathname. Nevertheless, it's good practice to restrict the characters in a filename to a subset of the normal printing characters. (We restrict the characters because if we use some of the shell's special characters in the filename, we have to use the shell's quoting mechanism to reference the filename, and this can get complicated.)
Two filenames are automatically created whenever a new directory is created: . (called dot) and .. (called dot-dot). Dot refers to the current directory, and dot-dot refers to the parent directory. In the root directory, dot-dot is the same as dot.
The Research UNIX System and some older UNIX System V file systems restricted a filename to 14 characters. BSD versions extended this limit to 255 characters. Today, almost all commercial UNIX file systems support at least 255-character filenames.
Pathname
A sequence of one or more filenames, separated by slashes and optionally starting with a slash, forms a pathname. A pathname that begins with a slash is called an absolute pathname; otherwise, it's called a relative pathname. Relative pathnames refer to files relative to the current directory. The name for the root of the file system (/) is a special-case absolute pathname that has no filename component.
Example
Listing the names of all the files in a directory is not difficult. Figure 1.3 shows a bare-bones implementation of the ls(1) command.
The notation ls(1) is the normal way to reference a particular entry in the UNIX system manuals. It refers to the entry for ls in Section 1. The sections are normally numbered 1 through 8, and all the entries within each section are arranged alphabetically. Throughout this text, we assume that you have a copy of the manuals for your UNIX system.
Historically, UNIX systems lumped all eight sections together into what was called the UNIX Programmer's Manual. As the page count increased, the trend changed to distributing the sections among separate manuals: one for users, one for programmers, and one for system administrators, for example.
Some UNIX systems further divide the manual pages within a given section, using an uppercase letter. For example, all the standard input/output (I/O) functions in AT&T [1990e] are indicated as being in Section 3S, as in fopen(3S). Other systems have replaced the numeric sections with alphabetic ones, such as C for commands.
Today, most manuals are distributed in electronic form. If your manuals are online, the way to see the manual pages for the ls command would be something like
man 1 ls
or
man -s1 ls
Figure 1.3 is a program that just prints the name of every file in a directory, and nothing else. If the source file is named myls.c, we compile it into the default a.out executable file by
cc myls.c
Historically, cc(1) is the C compiler. On systems with the GNU C compilation system, the C compiler is gcc(1). Here, cc is often linked to gcc.
Some sample output is
$ ./a.out /dev
.
..
console
tty
mem
kmem
null
mouse
stdin
stdout
stderr
zero
many more lines that aren't shown
cdrom
$ ./a.out /var/spool/cron
can't open /var/spool/cron: Permission denied
$ ./a.out /dev/tty
can't open /dev/tty: Not a directory
Throughout this text, we'll show commands that we run and the resulting output in this fashion: Characters that we type are shown in this font, whereas output from programs is shown like this. If we need to add comments to this output, we'll show the comments in italics. The dollar sign that precedes our input is the prompt that is printed by the shell. We'll always show the shell prompt as a dollar sign.
Note that the directory listing is not in alphabetical order; the real ls command sorts the names before printing them, but our bare-bones version does not.
There are many details to consider in this 20-line program.
First, we include a header of our own: apue.h. We include this header in almost every program in this text. This header includes some standard system headers and defines numerous constants and function prototypes that we use throughout the examples in the text.
The declaration of the main function uses the style supported by the ISO C standard. (We'll have more to say about the ISO C standard in the next chapter.)
We take an argument from the command line, argv[1], as the name of the directory to list.
Because the actual format of directory entries varies from one UNIX system to another, we use the functions opendir, readdir, and closedir to manipulate the directory.
The opendir function returns a pointer to a DIR structure, and we pass this pointer to the readdir function. We don't care what's in the DIR structure. We then call readdir in a loop, to read each directory entry. The readdir function returns a pointer to a dirent structure or, when it's finished with the directory, a null pointer. All we examine in the dirent structure is the name of each directory entry (d_name). Using this name, we could then call the stat function to determine all the attributes of the file.
We call two functions of our own to handle the errors: err_sys and err_quit. We can see from the preceding output that the err_sys function prints an informative message describing what type of error was encountered ("Permission denied" or "Not a directory").
When the program is done, it calls the function exit with an argument of 0. The function exit terminates a program. By convention, an argument of 0 means OK, and an argument between 1 and 255 means that an error occurred.
Figure 1.3. List all the files in a directory
#include "apue.h"
#include <dirent.h>

int
main(int argc, char *argv[])
{
    DIR             *dp;
    struct dirent   *dirp;

    if (argc != 2)
        err_quit("usage: ls directory_name");

    if ((dp = opendir(argv[1])) == NULL)
        err_sys("can't open %s", argv[1]);
    while ((dirp = readdir(dp)) != NULL)
        printf("%s\n", dirp->d_name);

    closedir(dp);
    exit(0);
}
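As a rough illustration of the stat function mentioned above, the loop in Figure 1.3 could be extended along the following lines. This is only a sketch, not one of this text's figures: it builds each entry's pathname in a fixed-size buffer (the 1024-byte limit is arbitrary) and prints the file's size from the st_size member of the stat structure.

#include "apue.h"
#include <dirent.h>
#include <sys/stat.h>

/* sketch: print the size of each entry in the directory named by argv[1] */
int
main(int argc, char *argv[])
{
    DIR             *dp;
    struct dirent   *dirp;
    struct stat     sb;
    char            path[1024];

    if (argc != 2)
        err_quit("usage: lssize directory_name");
    if ((dp = opendir(argv[1])) == NULL)
        err_sys("can't open %s", argv[1]);
    while ((dirp = readdir(dp)) != NULL) {
        /* build "directory/entry" so stat resolves relative to argv[1] */
        snprintf(path, sizeof(path), "%s/%s", argv[1], dirp->d_name);
        if (stat(path, &sb) < 0)
            err_ret("can't stat %s", path);
        else
            printf("%-20s %10lld bytes\n", dirp->d_name, (long long)sb.st_size);
    }
    closedir(dp);
    exit(0);
}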
Working Directory
Every process has a working directory, sometimes called the current working directory. This is the directory from which all relative pathnames are interpreted. A process can change its working directory with the chdir function.
For example, the relative pathname doc/memo/joe refers to the file or directory joe, in the directory memo, in the directory doc, which must be a directory within the working directory. From looking just at this pathname, we know that both doc and memo have to be directories, but we can't tell whether joe is a file or a directory. The pathname /usr/lib/lint is an absolute pathname that refers to the file or directory lint in the directory lib, in the directory usr, which is in the root directory.
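A minimal sketch of how the same relative pathname resolves differently after a chdir call; the pathnames here are just the examples from the text and are not assumed to exist, the point is only how resolution changes with the working directory.

#include "apue.h"
#include <fcntl.h>

int
main(void)
{
    int     fd;

    /* resolved relative to whatever the working directory happens to be */
    if ((fd = open("doc/memo/joe", O_RDONLY)) >= 0)
        close(fd);

    if (chdir("/tmp") < 0)              /* change the working directory */
        err_sys("chdir error");

    /* the same relative pathname now refers to /tmp/doc/memo/joe */
    if ((fd = open("doc/memo/joe", O_RDONLY)) >= 0)
        close(fd);
    exit(0);
}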
Home Directory
When we log in, the working directory is set to our home directory. Our home directory is obtained from our entry in the password file.
INPUT AND OUTPUT
File Descriptors
File descriptors are normally small non-negative integers that the kernel uses to identify the files being accessed by a particular process. Whenever it opens an existing file or creates a new file, the kernel returns a file descriptor that we use when we want to read or write the file.
Standard Input, Standard Output, and Standard Error
By convention, all shells open three descriptors whenever a new program is run: standard input, standard output, and standard error. If nothing special is done, as in the simple command
ls
then all three are connected to the terminal. Most shells provide a way to redirect any or all of these three descriptors to any file. For example,
ls > file.list
executes the ls command with its standard output redirected to the file named file.list.
Unbuffered I/O
Unbuffered I/O is provided by the functions open, read, write, lseek, and close. These functions all work with file descriptors.
Example
If we're willing to read from the standard input and write to the standard output, then the program in Figure 1.4 copies any regular file on a UNIX system.
The <unistd.h> header, included by apue.h, and the two constants STDIN_FILENO and STDOUT_FILENO are part of the POSIX standard (about which we'll have a lot more to say in the next chapter). In this header are function prototypes for many of the UNIX system services, such as the read and write functions that we call.
The constants STDIN_FILENO and STDOUT_FILENO are defined in <unistd.h> and specify the file descriptors for standard input and standard output. These values are typically 0 and 1, respectively, but we'll use the new names for portability.
The read function returns the number of bytes that are read, and this value is used as the number of bytes to write. When the end of the input file is encountered, read returns 0 and the program stops. If a read error occurs, read returns -1. Most of the system functions return -1 when an error occurs.
If we compile the program into the standard name (a.out) and execute it as
./a.out > data
standard input is the terminal, standard output is redirected to the file data, and standard error is also the terminal. If this output file doesn't exist, the shell creates it by default. The program copies lines that we type to the standard output until we type the end-of-file character (usually Control-D).
If we run
./a.out < infile > outfile
then the file named infile will be copied to the file named outfile.
Figure 1.4. Copy standard input to standard output, using read and write
#include "apue.h"

#define BUFFSIZE    4096

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)
        err_sys("read error");

    exit(0);
}
Standard I/O
The standard I/O functions provide a buffered interface to the unbuffered I/O functions. Using standard I/O prevents us from having to worry about choosing optimal buffer sizes, such as the BUFFSIZE constant in Figure 1.4. Another advantage of using the standard I/O functions is that they simplify dealing with lines of input (a common occurrence in UNIX applications). The fgets function, for example, reads an entire line. The read function, on the other hand, reads a specified number of bytes. The standard I/O library provides functions that let us control the style of buffering used by the library.
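As a small illustration of the line-oriented interface (a sketch, not one of this text's figures), a copy program that works one line at a time with fgets and fputs might look like this; MAXLINE comes from apue.h, as in the later examples.

#include "apue.h"

int
main(void)
{
    char    line[MAXLINE];      /* MAXLINE is defined in apue.h */

    while (fgets(line, MAXLINE, stdin) != NULL)     /* read one line */
        if (fputs(line, stdout) == EOF)             /* write that line */
            err_sys("output error");

    if (ferror(stdin))
        err_sys("input error");
    exit(0);
}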
The most common standard I/O function is printf. In programs that call printf, we'll always include <stdio.h> (normally by including apue.h), as this header contains the function prototypes for all the standard I/O functions.
Example
The program in Figure 1.5 is like the previous program that called read and write. This program copies standard input to standard output and can copy any regular file.
The function getc reads one character at a time, and this character is written by putc. After the last byte of input has been read, getc returns the constant EOF (defined in <stdio.h>). The standard I/O constants stdin and stdout are also defined in the <stdio.h> header and refer to the standard input and standard output.
Figure 1.5. Copy standard input to standard output, using standard I/O
#include "apue.h"

int
main(void)
{
    int     c;

    while ((c = getc(stdin)) != EOF)
        if (putc(c, stdout) == EOF)
            err_sys("output error");

    if (ferror(stdin))
        err_sys("input error");

    exit(0);
}
PROGRAMS AND PROCESSES
Program
A program is an executable file residing on disk in a directory. A program is read into memory and is executed by the kernel as a result of one of the six exec functions.
Processes and Process ID
An executing instance of a program is called a process, a term used on almost every page of this text. Some operating systems use the term task to refer to a program that is being executed.
The UNIX System guarantees that every process has a unique numeric identifier called the process ID. The process ID is always a non-negative integer.
Example
The program in Figure 1.6 prints its process ID.
If we compile this program into the file a.out and execute it, we have
$ ./a.out
hello world from process ID 851
$ ./a.out
hello world from process ID 854
When this program runs, it calls the function getpid to obtain its process ID.
Figure 1.6. Print the process ID
#include "apue.h"

int
main(void)
{
    printf("hello world from process ID %d\n", getpid());
    exit(0);
}
Process Control
There are three primary functions for process control: fork, exec, and waitpid. (The exec function has six variants, but we often refer to them collectively as simply the exec function.)
Example
The process control features of the UNIX System are demonstrated using a simple program (Figure 1.7) that reads commands from standard input and executes the commands. This is a bare-bones implementation of a shell-like program. There are several features to consider in this 30-line program.
We use the standard I/O function fgets to read one line at a time from the standard input. When we type the end-of-file character (which is often Control-D) as the first character of a line, fgets returns a null pointer, the loop stops, and the process terminates.
Because each line returned by fgets is terminated with a newline character, followed by a null byte, we use the standard C function strlen to calculate the length of the string, and then replace the newline with a null byte. We do this because the execlp function wants a null-terminated argument, not a newline-terminated argument.
We call fork to create a new process, which is a copy of the caller. We say that the caller is the parent and that the newly created process is the child. Then fork returns the non-negative process ID of the new child process to the parent, and returns 0 to the child. Because fork creates a new process, we say that it is called once (by the parent) but returns twice (in the parent and in the child).
In the child, we call execlp to execute the command that was read from the standard input. This replaces the child process with the new program file. The combination of a fork, followed by an exec, is what some operating systems call spawning a new process. In the UNIX System, the two parts are separated into individual functions.
Because the child calls execlp to execute the new program file, the parent wants to wait for the child to terminate. This is done by calling waitpid, specifying which process we want to wait for: the pid argument, which is the process ID of the child. The waitpid function also returns the termination status of the child (the status variable), but in this simple program, we don't do anything with this value. We could examine it to determine exactly how the child terminated.
The most fundamental limitation of this program is that we can't pass arguments to the command that we execute. We can't, for example, specify the name of a directory to list. We can execute ls only on the working directory. To allow arguments would require that we parse the input line, separating the arguments by some convention, probably spaces or tabs, and then pass each argument as a separate argument to the execlp function. Nevertheless, this program is still a useful demonstration of the process control functions of the UNIX System.
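As a stand-alone sketch of just that parsing step (not part of Figure 1.7), the input line can be split on spaces and tabs with strtok, and execvp can be used instead of execlp because the number of arguments is no longer fixed. In Figure 1.7 this logic would go in the child branch; MAXARGS is an arbitrary limit chosen for the sketch, and strtok is declared in <string.h>.

#include "apue.h"
#include <string.h>

#define MAXARGS 64              /* arbitrary limit for this sketch */

int
main(void)
{
    char    buf[MAXLINE];
    char    *arg[MAXARGS];
    int     i;

    if (fgets(buf, MAXLINE, stdin) == NULL)
        exit(0);
    if (buf[strlen(buf) - 1] == '\n')
        buf[strlen(buf) - 1] = 0;       /* replace newline with null */

    /* split the line on blanks and tabs into a null-terminated vector */
    i = 0;
    arg[i] = strtok(buf, " \t");
    while (arg[i] != NULL && i < MAXARGS - 1)
        arg[++i] = strtok(NULL, " \t");
    arg[i] = NULL;

    if (arg[0] == NULL)
        exit(0);                        /* blank line, nothing to execute */
    execvp(arg[0], arg);                /* search PATH, pass the argument vector */
    err_ret("couldn't execute: %s", arg[0]);
    exit(127);
}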
If we run the program in Figure 1.7, we get the following results. Note that our program has a different prompt (the percent sign) to distinguish it from the shell's prompt.
$ ./a.out
% date
Sun Aug  1 03:04:47 EDT 2004        programmers work late
% who
sar      :0           Jul 26 22:54
sar      pts/0        Jul 26 22:54  (:0)
sar      pts/1        Jul 26 22:54  (:0)
sar      pts/2        Jul 26 22:54  (:0)
% pwd
/home/sar/bk/apue/2e
% ls
Makefile
a.out
shell1.c
% ^D                                type the end-of-file character
$                                   the regular shell prompt
Figure 1.7. Read commands from standard input and execute them
#include "apue.h"
#include <sys/wait.h>

int
main(void)
{
    char    buf[MAXLINE];   /* from apue.h */
    pid_t   pid;
    int     status;

    printf("%% ");  /* print prompt (printf requires %% to print %) */
    while (fgets(buf, MAXLINE, stdin) != NULL) {
        if (buf[strlen(buf) - 1] == '\n')
            buf[strlen(buf) - 1] = 0;   /* replace newline with null */

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {          /* child */
            execlp(buf, buf, (char *)0);
            err_ret("couldn't execute: %s", buf);
            exit(127);
        }

        /* parent */
        if ((pid = waitpid(pid, &status, 0)) < 0)
            err_sys("waitpid error");
        printf("%% ");
    }
    exit(0);
}
The notation ^D is used to indicate a control character. Control characters are special characters formed by holding down the control key (often labeled Control or Ctrl) on your keyboard and then pressing another key at the same time. Control-D, or ^D, is the default end-of-file character.
Threads and Thread IDs
Usually, a process has only one thread of control: one set of machine instructions executing at a time. Some problems are easier to solve when more than one thread of control can operate on different parts of the problem. Additionally, multiple threads of control can exploit the parallelism possible on multiprocessor systems.
All the threads within a process share the same address space, file descriptors, stacks, and process-related attributes. Because they can access the same memory, the threads need to synchronize access to shared data among themselves to avoid inconsistencies.
As with processes, threads are identified by IDs. Thread IDs, however, are local to a process. A thread ID from one process has no meaning in another process. We use thread IDs to refer to specific threads as we manipulate the threads within a process.
Functions to control threads parallel those used to control processes. Because threads were added to the UNIX System long after the process model was established, however, the thread model and the process model have some complicated interactions.
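A minimal sketch of these thread functions, assuming a POSIX threads implementation and linking with -lpthread; note that pthread_t is an opaque type, so printing it as an integer, as done here for illustration, is not strictly portable.

#include "apue.h"
#include <pthread.h>

static void *
thr_fn(void *arg)
{
    printf("new thread:  ID %lu\n", (unsigned long)pthread_self());
    return NULL;
}

int
main(void)
{
    pthread_t   tid;
    int         err;

    if ((err = pthread_create(&tid, NULL, thr_fn, NULL)) != 0)
        err_quit("can't create thread: %s", strerror(err));
    pthread_join(tid, NULL);    /* wait for the new thread to finish */

    printf("main thread: ID %lu\n", (unsigned long)pthread_self());
    exit(0);
}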
ERROR HANDLING
When an error occurs in one of the UNIX System functions, a negative value is often returned, and the integer errno is usually set to a value that gives additional information. For example, the open function returns either a non-negative file descriptor if all is OK or -1 if an error occurs. An error from open has about 15 possible errno values, such as file doesn't exist, permission problem, and so on. Some functions use a convention other than returning a negative value. For example, most functions that return a pointer to an object return a null pointer to indicate an error.
The file <errno.h> defines the symbol errno and constants for each value that errno can assume. Each of these constants begins with the character E. Also, the first page of Section 2 of the UNIX system manuals, named intro(2), usually lists all these error constants. For example, if errno is equal to the constant EACCES, this indicates a permission problem, such as insufficient permission to open the requested file.
On Linux, the error constants are listed in the errno(3) manual page.
POSIX and ISO C define errno as a symbol expanding into a modifiable lvalue of type integer. This can be either an integer that contains the error number or a function that returns a pointer to the error number. The historical definition is
extern int errno;
But in an environment that supports threads, the process address space is shared among multiple threads, and each thread needs its own local copy of errno to prevent one thread from interfering with another. Linux, for example, supports multithreaded access to errno by defining it as
extern int *__errno_location(void);
#define errno (*__errno_location())
There are two rules to be aware of with respect to errno. First, its value is never cleared by a routine if an error does not occur. Therefore, we should examine its value only when the return value from a function indicates that an error occurred. Second, the value of errno is never set to 0 by any of the functions, and none of the constants defined in <errno.h> has a value of 0.
Two functions are defined by the C standard to help with printing error messages.
#include <string.h>

char *strerror(int errnum);
Returns: pointer to message string
This function maps errnum, which is typically the errno value, into an error message string and returns a pointer to the string.
The perror function produces an error message on the standard error, based on the current value of errno, and returns.
#include <stdio.h>

void perror(const char *msg);
It outputs the string pointed to by msg, followed by a colon and a space, followed by the error message corresponding to the value of errno, followed by a newline.
Example
Figure 1.8 shows the use of these two error functions.
If this program is compiled into the file a.out, we have
$ ./a.out
EACCES: Permission denied
./a.out: No such file or directory
Note that we pass the name of the program (argv[0], whose value is ./a.out) as the argument to perror. This is a standard convention in the UNIX System. By doing this, if the program is executed as part of a pipeline, as in
prog1 < inputfile | prog2 | prog3 > outputfile
we are able to tell which of the three programs generated a particular error message.
Figure 1.8. Demonstrate strerror and perror
#include "apue.h"
#include <errno.h>

int
main(int argc, char *argv[])
{
    fprintf(stderr, "EACCES: %s\n", strerror(EACCES));
    errno = ENOENT;
    perror(argv[0]);
    exit(0);
}
Instead of calling either strerror or perror directly, all the examples in this text use the error functions. The error functions let us use the variable argument list facility of ISO C to handle error conditions with a single C statement.
Error Recovery
The errors defined in <errno.h> can be divided into two categories: fatal and nonfatal. A fatal error has no recovery action. The best we can do is print an error message on the user's screen or write an error message into a log file, and then exit. Nonfatal errors, on the other hand, can sometimes be dealt with more robustly. Most nonfatal errors are temporary in nature, such as with a resource shortage, and might not occur when there is less activity on the system.
Resource-related nonfatal errors include EAGAIN, ENFILE, ENOBUFS, ENOLCK, ENOSPC, ENOSR, EWOULDBLOCK, and sometimes ENOMEM. EBUSY can be treated as a nonfatal error when it indicates that a shared resource is in use. Sometimes, EINTR can be treated as a nonfatal error when it interrupts a slow system call.
The typical recovery action for a resource-related nonfatal error is to delay a little and try again later. This technique can be applied in other circumstances. For example, if an error indicates that a network connection is no longer functioning, it might be possible for the application to delay a short time and then reestablish the connection. Some applications use an exponential backoff algorithm, waiting a longer period of time each iteration.
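A sketch of the delay-and-retry idea with exponential backoff, under the assumption that an open failing with EBUSY or EAGAIN is worth retrying; the device name, retry count, and initial delay are all made up for the sketch.

#include "apue.h"
#include <errno.h>
#include <fcntl.h>

int
main(void)
{
    int         fd, tries;
    unsigned    delay = 1;      /* start with a one-second delay */

    for (tries = 0; tries < 5; tries++) {
        /* "/dev/somedevice" is a hypothetical shared resource */
        if ((fd = open("/dev/somedevice", O_RDWR)) >= 0)
            break;                      /* success */
        if (errno != EBUSY && errno != EAGAIN)
            err_sys("open error");      /* not a recoverable error */
        sleep(delay);
        delay *= 2;                     /* exponential backoff */
    }
    if (fd < 0)
        err_quit("device still busy after %d tries", tries);
    /* ... use fd ... */
    exit(0);
}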
Ultimately, it is up to the application developer to determine which errors are recoverable. If a reasonable strategy can be used to recover from an error, we can improve the robustness of our application by avoiding an abnormal exit.
USER IDENTIFICATION
User ID
The user ID from our entry in the password file is a numeric value that identifies us to the system. This user ID is assigned by the system administrator when our login name is assigned, and we cannot change it. The user ID is normally assigned to be unique for every user. We'll see how the kernel uses the user ID to check whether we have the appropriate permissions to perform certain operations.
We call the user whose user ID is 0 either root or the superuser. The entry in the password file normally has a login name of root, and we refer to the special privileges of this user as superuser privileges. If a process has superuser privileges, most file permission checks are bypassed. Some operating system functions are restricted to the superuser. The superuser has free rein over the system.
Client versions of Mac OS X ship with the superuser account disabled; server versions ship with the account already enabled. Instructions are available on Apple's Web site describing how to enable it. See http://docs.info.apple.com/article.html?artnum=106290.
Group ID
Our entry in the password file also specifies our numeric group ID. This too is assigned by the system administrator when our login name is assigned. Typically, the password file contains multiple entries that specify the same group ID. Groups are normally used to collect users together into projects or departments. This allows the sharing of resources, such as files, among members of the same group. We can set the permissions on a file so that all members of a group can access the file, whereas others outside the group cannot.
There is also a group file that maps group names into numeric group IDs. The group file is usually /etc/group.
The use of numeric user IDs and numeric group IDs for permissions is historical. With every file on disk, the file system stores both the user ID and the group ID of a file's owner. Storing both of these values requires only four bytes, assuming that each is stored as a two-byte integer. If the full ASCII login name and group name were used instead, additional disk space would be required. In addition, comparing strings during permission checks is more expensive than comparing integers.
Users, however, work better with names than with numbers, so the password file maintains the mapping between login names and user IDs, and the group file provides the mapping between group names and group IDs. The ls -l command, for example, prints the login name of the owner of a file, using the password file to map the numeric user ID into the corresponding login name.
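A short sketch of the mapping in the other direction, from numeric IDs back to names, using the getpwuid and getgrgid library functions (the same kind of lookup a program like ls -l relies on):

#include "apue.h"
#include <pwd.h>
#include <grp.h>

int
main(void)
{
    struct passwd   *pw;
    struct group    *gr;

    if ((pw = getpwuid(getuid())) != NULL)      /* password file lookup */
        printf("login name: %s\n", pw->pw_name);
    if ((gr = getgrgid(getgid())) != NULL)      /* group file lookup */
        printf("group name: %s\n", gr->gr_name);
    exit(0);
}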
Early UNIX systems used 16-bit integers to represent user and group IDs. Contemporary UNIX systems use 32-bit integers.
Example
The program in Figure 1.9 prints the user ID and the group ID.
We call the functions getuid and getgid to return the user ID and the group ID. Running the program yields
$ ./a.out
uid = 205, gid = 105
Figure 1.9. Print user ID and group ID
#include "apue.h"

int
main(void)
{
    printf("uid = %d, gid = %d\n", getuid(), getgid());
    exit(0);
}
Supplementary Group IDs
In addition to the group ID specified in the password file for a login name, most versions of the UNIX System allow a user to belong to additional groups. This started with 4.2BSD, which allowed a user to belong to up to 16 additional groups. These supplementary group IDs are obtained at login time by reading the file /etc/group and finding the first 16 entries that list the user as a member. As we shall see in the next chapter, POSIX requires that a system support at least eight supplementary groups per process, but most systems support at least 16.
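The supplementary group IDs of the calling process can be retrieved with the getgroups function; a minimal sketch (NGROUPS_MAX comes from <limits.h>):

#include "apue.h"
#include <limits.h>

int
main(void)
{
    gid_t   list[NGROUPS_MAX];
    int     i, n;

    if ((n = getgroups(NGROUPS_MAX, list)) < 0)
        err_sys("getgroups error");
    for (i = 0; i < n; i++)
        printf("supplementary group ID: %d\n", (int)list[i]);
    exit(0);
}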
SIGNALS
Signals are a technique used to notify a process that some condition has occurred. For example, if a process divides by zero, the signal whose name is SIGFPE (floating-point exception) is sent to the process. The process has three choices for dealing with the signal.
Ignore the signal. This option isn't recommended for signals that denote a hardware exception, such as dividing by zero or referencing memory outside the address space of the process, as the results are undefined.
Let the default action occur. For a divide-by-zero condition, the default is to terminate the process.
Provide a function that is called when the signal occurs (this is called "catching" the signal). By providing a function of our own, we'll know when the signal occurs and we can handle it as we wish.
Many conditions generate signals. Two terminal keys, called the interrupt key (often the DELETE key or Control-C) and the quit key (often Control-backslash), are used to interrupt the currently running process. Another way to generate a signal is by calling the kill function. We can call this function from a process to send a signal to another process. Naturally, there are limitations: we have to be the owner of the other process (or the superuser) to be able to send it a signal.
Example
Recall the bare-bones shell example (Figure 1.7). If we invoke this program and press the interrupt key, the process terminates because the default action for this signal, named SIGINT, is to terminate the process. The process hasn't told the kernel to do anything other than the default with this signal, so the process terminates.
To catch this signal, the program needs to call the signal function, specifying the name of the function to call when the SIGINT signal is generated. The function is named sig_int; when it's called, it just prints a message and a new prompt. Adding 11 lines to the program in Figure 1.7 gives us the version in Figure 1.10. (The 11 new lines are indicated with a plus sign at the beginning of the line.)
Figure 1.10. Read commands from standard input and execute them
   #include "apue.h"
   #include <sys/wait.h>

+  static void sig_int(int);      /* our signal-catching function */
+
   int
   main(void)
   {
       char    buf[MAXLINE];   /* from apue.h */
       pid_t   pid;
       int     status;

+      if (signal(SIGINT, sig_int) == SIG_ERR)
+          err_sys("signal error");
+
       printf("%% ");  /* print prompt (printf requires %% to print %) */
       while (fgets(buf, MAXLINE, stdin) != NULL) {
           if (buf[strlen(buf) - 1] == '\n')
               buf[strlen(buf) - 1] = 0;   /* replace newline with null */

           if ((pid = fork()) < 0) {
               err_sys("fork error");
           } else if (pid == 0) {          /* child */
               execlp(buf, buf, (char *)0);
               err_ret("couldn't execute: %s", buf);
               exit(127);
           }

           /* parent */
           if ((pid = waitpid(pid, &status, 0)) < 0)
               err_sys("waitpid error");
           printf("%% ");
       }
       exit(0);
   }
+
+  void
+  sig_int(int signo)
+  {
+      printf("interrupt\n%% ");
+  }
TIME VALUES
Historically, UNIX systems have maintained two different time values:
Calendar time. This value counts the number of seconds since the Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time (UTC). (Older manuals refer to UTC as Greenwich Mean Time.) These time values are used to record the time when a file was last modified, for example.
The primitive system data type time_t holds these time values.
Process time. This is also called CPU time and measures the central processor resources used by a process. Process time is measured in clock ticks, which have historically been 50, 60, or 100 ticks per second.
The primitive system data type clock_t holds these time values.
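A small sketch of these two kinds of time values: the time function returns the calendar time as a time_t, and sysconf(_SC_CLK_TCK) reports how many clock ticks per second the clock_t values filled in by the times function are measured in.

#include "apue.h"
#include <time.h>
#include <sys/times.h>

int
main(void)
{
    time_t      now;
    struct tms  t;
    long        clktck;

    now = time(NULL);           /* calendar time: seconds since the Epoch */
    printf("seconds since the Epoch: %ld\n", (long)now);

    if ((clktck = sysconf(_SC_CLK_TCK)) < 0)
        err_sys("sysconf error");
    if (times(&t) == (clock_t)-1)
        err_sys("times error");
    printf("clock ticks per second: %ld\n", clktck);
    printf("user CPU time so far: %.2f seconds\n",
           t.tms_utime / (double)clktck);
    exit(0);
}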
When we measure the execution time of a process, we'll see that the UNIX System maintains three values for a process:
Clock time
User CPU time
System CPU time
The clock time, sometimes called wall clock time, is the amount of time the process takes to run, and its value depends on the number of other processes being run on the system. Whenever we report the clock time, the measurements are made with no other activities on the system.
The user CPU time is the CPU time attributed to user instructions. The system CPU time is the CPU time attributed to the kernel when it executes on behalf of the process. For example, whenever a process executes a system service, such as read or write, the time spent within the kernel performing that system service is charged to the process. The sum of user CPU time and system CPU time is often called the CPU time.
It is easy to measure the clock time, user time, and system time of any process: simply execute the time(1) command, with the argument to the time command being the command we want to measure. For example:
$ cd /usr/include
$ time -p grep _POSIX_SOURCE */*.h > /dev/null
real  0m0.81s
user  0m0.11s
sys   0m0.07s
The output format from the time command depends on the shell being used, because some shells don't run /usr/bin/time, but instead have a separate built-in function to measure the time it takes commands to run.
SYSTEM CALLS AND LIBRARY FUNCTIONS
All operating systems provide service points through which programs request services from the kernel. All implementations of the UNIX System provide a well-defined, limited number of entry points directly into the kernel called system calls (recall Figure 1.1). Version 7 of the Research UNIX System provided about 50 system calls, 4.4BSD provided about 110, and SVR4 had around 120. Linux has anywhere between 240 and 260 system calls, depending on the version. FreeBSD has around 320.
The system call interface has always been documented in Section 2 of the UNIX Programmer's Manual. Its definition is in the C language, regardless of the actual implementation technique used on any given system to invoke a system call. This differs from many older operating systems, which traditionally defined the kernel entry points in the assembler language of the machine.
The technique used on UNIX systems is for each system call to have a function of the same name in the standard C library. The user process calls this function, using the standard C calling sequence. This function then invokes the appropriate kernel service, using whatever technique is required on the system. For example, the function may put one or more of the C arguments into general registers and then execute some machine instruction that generates a software interrupt in the kernel. For our purposes, we can consider the system calls as being C functions.
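As a rough, Linux-specific illustration of the idea that a kernel entry point sits behind a like-named C library function, the sketch below invokes the same service both ways; it relies on the Linux syscall(2) wrapper and the SYS_getpid constant from <sys/syscall.h>, and is not portable.

#include "apue.h"
#include <sys/syscall.h>

int
main(void)
{
    /* the ordinary library wrapper ... */
    printf("getpid()            = %ld\n", (long)getpid());
    /* ... and the same kernel service invoked through syscall(2) on Linux */
    printf("syscall(SYS_getpid) = %ld\n", (long)syscall(SYS_getpid));
    exit(0);
}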
Section 3 of the UNIX Programmer's Manual defines the general-purpose functions available to programmers. These functions aren't entry points into the kernel, although they may invoke one or more of the kernel's system calls. For example, the printf function may use the write system call to output a string, but the strcpy (copy a string) and atoi (convert ASCII to integer) functions don't involve the kernel at all.
From an implementor's point of view, the distinction between a system call and a library function is fundamental. But from a user's perspective, the difference is not as critical. From our perspective in this text, both system calls and library functions appear as normal C functions. Both exist to provide services for application programs. We should realize, however, that we can replace the library functions, if desired, whereas the system calls usually cannot be replaced.
Consider the memory allocation function malloc as an example. There are many ways to do memory allocation and its associated garbage collection (best fit, first fit, and so on). No single technique is optimal for all programs. The UNIX system call that handles memory allocation, sbrk(2), is not a general-purpose memory manager. It increases or decreases the address space of the process by a specified number of bytes. How that space is managed is up to the process. The memory allocation function, malloc(3), implements one particular type of allocation. If we don't like its operation, we can define our own malloc function, which will probably use the sbrk system call. In fact, numerous software packages implement their own memory allocation algorithms with the sbrk system call. Figure 1.11 shows the relationship between the application, the malloc function, and the sbrk system call.
Figure 1.11. Separation of malloc function and sbrk system call
Here we have a clean separation of duties: the system call in the kernel allocates an additional chunk of space on behalf of the process. The malloc library function manages this space from user level.
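To make that division of labour concrete, here is a deliberately naive sketch of an allocator built directly on sbrk. It only grows the break, never frees anything, and ignores alignment, which are exactly the kinds of policy decisions a real malloc must make for itself; it is not how the system's malloc is implemented.

#include "apue.h"
#include <string.h>
#include <stdint.h>

/* toy allocator: grab space from the kernel with sbrk and hand it out */
static void *
toy_alloc(size_t nbytes)
{
    void    *p;

    p = sbrk((intptr_t)nbytes);     /* ask the kernel to grow the address space */
    if (p == (void *)-1)            /* sbrk returns (void *)-1 on failure */
        return NULL;
    return p;                       /* previous break is the start of the new space */
}

int
main(void)
{
    char    *buf;

    if ((buf = toy_alloc(100)) == NULL)
        err_sys("toy_alloc error");
    strcpy(buf, "managed in user space, allocated by the kernel");
    printf("%s\n", buf);
    exit(0);
}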
Another example to illustrate the difference between a system call and a library function is the interface the UNIX System provides to determine the current time and date. Some operating systems provide one system call to return the time and another to return the date. Any special handling, such as the switch to or from daylight saving time, is handled by the kernel or requires human intervention. The UNIX System, on the other hand, provides a single system call that returns the number of seconds since the Epoch: midnight, January 1, 1970, Coordinated Universal Time. Any interpretation of this value, such as converting it to a human-readable time and date using the local time zone, is left to the user process. The standard C library provides routines to handle most cases. These library routines handle such details as the various algorithms for daylight saving time.
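A small sketch of leaving the interpretation to the user process: the single value obtained from the kernel is converted to a human-readable local time entirely by library routines.

#include "apue.h"
#include <time.h>

int
main(void)
{
    time_t      now;
    struct tm   *tmp;
    char        buf[64];

    now = time(NULL);           /* one value from the kernel: seconds since the Epoch */
    tmp = localtime(&now);      /* library routine applies the local time zone */
    if (strftime(buf, sizeof(buf), "%a %b %e %H:%M:%S %Z %Y", tmp) == 0)
        err_quit("buffer too small");
    printf("%s\n", buf);
    exit(0);
}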
An application can call either a system call or a library routine. Also realize that many library routines invoke a system call. This is shown in Figure 1.12.
Figure 1.12. Difference between C library functions and system calls
Another difference between system calls and library functions is that system calls usually provide a minimal interface, whereas library functions often provide more elaborate functionality. We've seen this already in the difference between the sbrk system call and the malloc library function.
The process control system calls (fork, exec, and wait) are usually invoked by the user's application code directly. (Recall the bare-bones shell in Figure 1.7.) But some library routines exist to simplify certain common cases: the system and popen library routines, for example.
To define the interface to the UNIX System that most programmers use, we have to describe both the system calls and some of the library functions. If we described only the sbrk system call, for example, we would skip the more programmer-friendly malloc library function that many applications use. In this text, we'll use the term function to refer to both system calls and library functions, except when the distinction is necessary.
SUMMARY
This has been a short tour of the UNIX System. We've described some of the fundamental terms that we'll encounter over and over again. We've seen numerous small examples of UNIX programs to give us a feel for what the remainder of the text talks about.
Text
Creating Killer iPhone Applications
Imagine that you’ve just landed at Heathrow Airport. It’s early in the morning, and you’re dead tired as you clear customs. All you want to do now is find the fastest way to get into London, check into your hotel, and sleep for a few hours. You take out your iPhone and touch the MobileTravel411 icon. On the left in Figure 1-1, you can see it asks whether you want to use Heathrow Airport as your current location. You touch Yes, and then touch Getting To From (as you can see in the center of Figure 1-1). Since it already knows that you’re at Heathrow, it gives you your alternatives. Because of the congestion in and out of London, it suggests using the Heathrow Express, especially during rush hour.
You touch the Heathrow Express tab, and it tells you where to get the train and also tells you that the fare is £14.50 if you buy it from the ticket machine and £17.50 if you buy it on board the train. (The iPhone on the right in Figure 1-1 is proof that I’m not making this up.) It turns out that you’re so jetlagged that you can’t do the math in your head, so you touch the Currency button, and it tells you that £14.50 is around $21.35 if you take it from the ATM, $21.14 on your no exchange-rate-fee credit card, or $22.31 at the bureau de change at the airport. Another touch gets you the current weather, which prompts you to dig out a sweater from your luggage before you get on the train.
When you get to Paddington Station, you really don’t have a clue where the hotel that someone at the office booked for you might be. You touch Getting Around, and the application allows you to use the hotel address that is in your iPhone Contacts, and then gives you your options when it comes to finally finding that big, comfortable, English bed. These include walking, taking a taxi, and public transit. You touch Tube, and it directs you to the nearest Tube stop, and then displays fares, schedules, and how to buy a ticket.
How much of a fantasy is this?
Not much. Except for automatically determining your location and giving you public-transit directions as well as real-time exchange rates, this application already exists. What’s more, it took me only a little more than two months to develop that application, starting from where you are now, with no iPhone programming experience.